Modules of human micro-RNA co-target network 



Mahashweta Basu^1, Nitai P. Bhattacharyya^2, P. K. Mohanty^1

^1 Theoretical Condensed Matter Physics Division,
Saha Institute of Nuclear Physics, 1/AF Bidhan Nagar, Kolkata 700064, India.
^2 Crystallography and Molecular Biology Division,
Saha Institute of Nuclear Physics, 1/AF Bidhan Nagar, Kolkata 700064, India.
E-mail: mahashweta.basu@saha.ac.in

Abstract. Human micro RNAs (miRNAs) target about 90% of the coding genes and form
a complex regulatory network. We study the community structure of the miRNA co-target
network, considering miRNAs as the nodes, which are connected by weighted links. The weight
of the link connecting a pair of miRNAs denotes the total number of common transcripts targeted
by that pair. We argue that the network consists of about 74 modules, quite similar to the
components (or clusters) obtained earlier [Online J Bioinformatics, 10, 280], indicating that
the components of the miRNA co-target network are self-organized in a way that maximizes the
modularity.



1. Introduction 

Micro RNAs are a class of small single-stranded non-coding RNAs, about 20 to 22 bases
long, which interfere with the translation of messenger RNAs (mRNAs) by binding to their
3' untranslated regions (UTR) [1]. Several computational methods [2] have been developed for
predicting the mRNA transcripts which are possible targets of a particular miRNA. For example,
711 nucleotide sequences are predicted to be human miRNAs [3]; their possible targets (34525
in total) are listed in the miRBase database [1]. It has been proposed on the basis of theoretical
analysis that as many as 90% of human genes are targets of miRNAs [3]. Regulation of coding genes
by miRNAs acting in combination has also been experimentally validated [6].

The abundance of miRNAs and their targets provides enormous combinatorial possibilities for
regulation. Combinatorial regulation of genes by transcription factors (TFs) and miRNAs enables
more complex regulatory programs [7]. Recently, taking TFs as important mediators of miRNA-initiated
regulatory effects, it was shown [8] that the underlying network is significantly associated with
multicellular organismal development, cell development and cell-cell signaling. A combinatorial
effect of miRNA modules [9] has been observed in tumor tissues and cell lines, suggesting that
module-associated miRNAs jointly regulate target genes in selected tumor tissues or cell lines.
The synergistic network [10] of miRNAs reveals that miRNA modules associated with diseases are
significantly different from modules of miRNAs that are not involved in disease. The possibility
of co-regulation by two or more miRNAs, in the context of gene expression and relevant biological
functions, is, however, largely unexplored.

Recently Mookherjee et al. [11] have analyzed the miRNA co-target network (MCN) of
Homo sapiens; their analysis indicates that several groups of miRNAs (so-called clusters) provide
the most essential regulation. This topological analysis of the miRNA network revealed that about
70 clusters of miRNAs co-target genes which are involved in specific pathways. For several clusters,
all miRNAs belonging to the cluster are found to be maximally expressed in a specific tissue.
Further studies [12] indicate that the clusters are also disease specific. Reorganizing miRNAs
into such groups (clusters) helps in identifying cooperative activity of miRNAs. In fact, from
these analyses one can predict that, "if one miRNA from a particular cluster is involved in a
specific biological pathway or cellular function, the other miRNAs belonging to the same cluster
are likely to be involved in the same disease, pathway or function".

Detection of communities, groups, components or clusters has been a focus of recent interest
in the context of complex networks. Networks like the world wide web [13], the metabolic network
[14], the social network [15], the protein-protein interaction network [16] etc. possess community
structure, meaning that the vertices tend to divide into groups, with dense connections within
the groups and sparse connections among the groups. These communities act as the
functional units of the network; for example, 'ATP synthesis', 'DNA processing' and 'cell cycle
control' are well known [17] functional modules of the yeast protein-protein interaction network.
Evidently, the functional properties of an entire network are quite different from its properties
at the community level.

In this article, we study the community structures (modules) of the human miRNA co-target
network and compare them with the components (clusters) of miRNAs obtained earlier [11]. Since
the components of a network are only its disjoint subgraphs, it is expected that the community
structure can be better represented by the modules. This is explained schematically in Fig. 1,
where the network has 3 components and 6 modules.

In section 2 we briefly review the relevant features of the human miRNA co-target network
and its components (clusters). In section 3 we apply the modularization method introduced by
Newman [18] to analyze this miRNA co-target network and compare the resulting modules
with the clusters obtained earlier [11]. Finally, conclusions are given in section 4.




Figure 1. Components versus modules: Components (or clusters) are disjoint subgraphs of
a network (they do not have any common link), whereas modules may have relatively few
connections among them. The above network has 3 components, namely (a), (b) and (c),
consisting of 1, 3 and 2 modules respectively. Component (b) has three modules
consisting of 4, 9 and 5 vertices; similarly, component (c) consists of two modules comprising
3 and 5 vertices.



2. Clusters of miRNAs in the co-target network 

Let us briefly revisit the main ideas and results of Ref. [11] to understand the
construction, topology and components of the human miRNA co-target network consisting
of 711 miRNAs and their 34525 predicted targets obtained from the miRBase database
(http://microrna.sanger.ac.uk/, version 10). For convenience, miRNAs are given an arbitrary
identification number $i = 1, 2, \ldots, M$, where $M = 711$. The miRNA co-target
network was then constructed by considering miRNAs as the nodes and joining every pair of miRNAs
having one or more common targets with a link. The total number of co-targeted transcripts
$C_{ij}$ of miRNAs $i$ and $j$ is taken to be the weight of the corresponding link. Clearly, the
resulting adjacency matrix $C$ is symmetric (with diagonal elements $C_{ii} = 0$).
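
The construction described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in Ref. [11]: the function name `cotarget_weights`, the toy miRNA names and the toy target sets are all hypothetical, and the real input would be the predicted target lists from miRBase.

```python
from itertools import combinations

def cotarget_weights(targets):
    """Return {(i, j): C_ij} for every miRNA pair sharing at least one target.

    `targets` maps each miRNA name to the set of transcripts it is predicted
    to target. Pairs with no common target get no link (C_ij = 0).
    """
    weights = {}
    for a, b in combinations(sorted(targets), 2):
        shared = len(targets[a] & targets[b])  # number of co-targeted transcripts
        if shared > 0:
            weights[(a, b)] = shared
    return weights

# Hypothetical toy data (assumption): miRNA name -> set of target transcripts.
toy_targets = {
    "miR-A": {"t1", "t2", "t3"},
    "miR-B": {"t2", "t3", "t4"},
    "miR-C": {"t4"},
    "miR-D": {"t9"},
}
toy_weights = cotarget_weights(toy_targets)
```

Storing each unordered pair once reflects the symmetry of $C$; a miRNA such as the hypothetical "miR-D", which shares no target with any other, simply has no links.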

Mookherjee et al. [11] have proposed an elegant method for finding the clusters of miRNAs.
Since a substantially large number of miRNA pairs have only a few co-targets, the links between
them have small weights and can be erased to obtain a simplified network. Let $N_q$ be the
number of components of the network when all the links having weight less than $q$ are erased.
As $q$ increases, the network breaks into smaller disjoint subgraphs (components) at a rate
$dN_q/dq$, which is maximum at $q = q^*$. It was argued that among all the subgraphs of the
co-target network obtained at $q^*$, the largest one is the most important; miRNAs belonging to
this subgraph are found to down-regulate expression of genes involved in several genetic diseases.
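
A minimal sketch of this thresholding step is given below, under stated assumptions: isolated nodes are counted as components, the rate is approximated by a finite difference between consecutive values of $q$, and the helper names (`count_components`, `steepest_threshold`) and the five-node toy network are hypothetical rather than taken from Ref. [11].

```python
def count_components(nodes, weights, q):
    """Number of connected components N_q after erasing links with weight < q."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        if w >= q:                      # the link survives the threshold
            parent[find(a)] = find(b)
    return len({find(n) for n in nodes})

def steepest_threshold(nodes, weights, q_values):
    """Return (q, counts): the q at which N_q grows fastest, and all N_q values."""
    counts = [count_components(nodes, weights, q) for q in q_values]
    jumps = [counts[k + 1] - counts[k] for k in range(len(counts) - 1)]
    best = max(range(len(jumps)), key=jumps.__getitem__)
    return q_values[best + 1], counts

# Hypothetical toy network (assumption): a chain a-b-c-d-e with one strong link.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_links = {("a", "b"): 5, ("b", "c"): 2, ("c", "d"): 2, ("d", "e"): 2}
q_star, n_q = steepest_threshold(toy_nodes, toy_links, [1, 2, 3, 4, 5, 6])
```

In the toy chain the three weak links disappear together when $q$ passes from 2 to 3, so the largest jump in $N_q$ occurs there; for the real network the analogous scan would be run over the observed range of weights.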

To be specific, the human miRNA co-target network breaks into $N_q = 166$ subgraphs at
$q = 103$, where the largest subgraph, called $\mathcal{G}$, contains 477 miRNAs. To determine how
miRNAs are organized within the subgraph $\mathcal{G}$, $q$ is increased further. At $q = 160$ the
subgraph $\mathcal{G}$ breaks up into 70 small clusters (the subgraphs having two or more miRNAs)
and 147 independent miRNAs. Out of the total of 70, 18 clusters arise from seed sequence^1
similarities and 11 clusters are organized into the same genomic region (5 inter-genomic clusters
also show seed sequence similarities). Most of the clusters are found to be pathway-, tissue-
and/or disease-specific.

In the following section we investigate the modular structure of the miRNA co-target
network. Figure 1 schematically describes why a network is better represented by its modules
than by its components (disjoint subgraphs).

3. Modular structure of miRNA network

The identification of community structure is one of the many challenging problems in various
scientific fields. A large variety of community detection techniques have been developed, based on
centrality measures, link density, percolation theory etc. Recently, Newman et al. [18] proposed
a method of finding the community structure of a network based on maximization of the modularity.
This method has been further generalized to include weighted networks [19].

The most obvious way of finding groups in a network is to minimize the number of edges
connecting the groups. Simply rearranging the network such that only a few edges exist between
the communities is not enough. Rather, one must rearrange it in a way that communities are
connected by fewer edges than expected. One can associate a score called modularity $Q$ [20]
with each possible partition of a network. $Q$ is defined, up to a multiplicative factor, as the
number of edges present within the groups minus the expected number in an equivalent random
network. Since positive values of $Q$ indicate the possible presence of community structure, one
needs to look for a partition for which the modularity is large and positive.

A good partition of a network can be obtained by maximizing the modularity index $Q$, defined
as follows. Let us consider a network with $n$ vertices labeled by $i = 1, \ldots, n$, and $m$ links.
The corresponding adjacency matrix is $A_{ij}$. Let the degree of each vertex $i$ be
$k_i = \sum_j A_{ij}$; thus $m = \frac{1}{2} \sum_i k_i$. If the network is to be partitioned into
two groups, one associates with each vertex $i$ a quantity $s_i$



^1 The nucleotides 2-7 of the miRNAs are called seed sequences.
Figure 2. When all the links having weight less than $q$ are erased, the miRNA co-target network
breaks into $N_q$ distinct components comprising $\mu_q$ modules. The main figure compares $\mu_q$
(solid line) with $N_q$ (circles). The inset shows that $d\mu_q/dq$ is maximum at $q = 103$, which
is the same as the value obtained earlier [11].

which takes the value $s_i = +1$ $(-1)$ if vertex $i$ belongs to group 1 (group 2).
Correspondingly, the modularity is given by

$$ Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \frac{1 + s_i s_j}{2}, \qquad (1) $$

where $k_i k_j / 2m$ is the expected number of links between $i$ and $j$ if edges were placed at
random. The term $(1 + s_i s_j)/2$ is 0 (1) if vertices $i$ and $j$ belong to different (the same)
groups; this ensures that $Q$ is maximum when the two groups are connected by a smaller than
expected number of links. In the following we apply this procedure to obtain the modular
structure of the MCN.
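
As a concrete check of Eq. (1), the short sketch below evaluates $Q$ for a given two-group assignment. It only scores a partition; it does not search for the maximizing one, which is what the algorithm of Ref. [18] does. The function name `bipartition_modularity` and the six-node toy graph are illustrative assumptions.

```python
def bipartition_modularity(adj, s):
    """Evaluate Eq. (1) for adjacency matrix `adj` and labels s_i = +1 or -1."""
    n = len(adj)
    k = [sum(row) for row in adj]      # degrees k_i = sum_j A_ij
    two_m = sum(k)                     # 2m = sum of all degrees
    total = 0.0
    for i in range(n):
        for j in range(n):
            same_group = (1 + s[i] * s[j]) / 2   # 1 if same group, else 0
            total += (adj[i][j] - k[i] * k[j] / two_m) * same_group
    return total / two_m

# Hypothetical toy graph (assumption): triangles (0,1,2) and (3,4,5) joined by edge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy_adj = [[0] * 6 for _ in range(6)]
for a, b in toy_edges:
    toy_adj[a][b] = toy_adj[b][a] = 1

q_natural = bipartition_modularity(toy_adj, [1, 1, 1, -1, -1, -1])  # one triangle per group
q_trivial = bipartition_modularity(toy_adj, [1, 1, 1, 1, 1, 1])     # everything in one group
```

For this toy graph the natural split gives $Q = 5/14$, while putting all vertices in one group gives $Q = 0$, in line with the statement that a large positive $Q$ signals community structure.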

The MCN is an undirected weighted network, where the weight $C_{ij}$ of a link corresponds to the
number of transcripts co-targeted by the pair of miRNAs $i$ and $j$. The diagonal
elements are taken to be $C_{ii} = 0$, as usual. It has been pointed out in Ref. [11] that the
weights vary widely, between 1 and 1253, and that most of the links with small weights can be
erased to obtain a simpler network. However, the connectivity of the network changes when links
having weight less than a predefined value $q$ are erased. Taking the adjacency matrix $C^q$,
defined as

$$ C^q_{ij} = \begin{cases} 1 & \text{if } C_{ij} \ge q, \\ 0 & \text{otherwise,} \end{cases} \qquad (2) $$

Mookherjee et al. [11] have calculated the number of components $N_q$ by varying $q$. Since
this adjacency matrix $C^q$ is unweighted (it keeps the information on connectivity while ignoring
the actual weights), one can apply the idea of modularity maximization [18] to detect the
communities or modules present there. Let $\mu_q$ be the total number of modules of $C^q$. Clearly
$\mu_q \ge N_q$, as each component can either have one module, i.e. itself, or break into two or
more modules. In Fig. 2 we have compared $\mu_q$, obtained from the modularization method, with
the number of components $N_q$ obtained earlier. It is evident that $\mu_q \approx N_q$; a
negligibly small positive difference is not visible in the figure. This brings us to conclude that
the components of the network are self-organized in a way that the modularity (given by Eq. (1))
is maximized.

We also find that $d\mu_q/dq$ is maximum at $q = 103$, which is the value obtained earlier
[11]. Now let us have a closer look at the sizes of the components and those of the modules
Figure 3. Components of $\mathcal{G}$ (which has 477 miRNAs): Only the clusters (components of
size larger than 1) are shown. Each of the large clusters, with sizes (7, 8, 9, 14, 16, 31, 47),
appears once. The other 63 clusters comprise (2; 29), (3; 21), (4; 2), (5; 7), (6; 2), (11; 2),
where (m; n) denotes that a cluster of size m appears n times.
Table 1. Comparison of the number of modules of the miRNA network with the number of
components, at $q = q^* = 103$.



obtained at $q^* = 103$. These are listed in Table 1. Note that the modularity maximization
algorithm organizes the network into several small modules and a few moderate-size modules,
of sizes 26, 37, 57, 98 and 101. In terms of components, by contrast, the network breaks into a
few clusters of small sizes (e.g. 2, 3, 4, 5, 7) along with a giant cluster ($\mathcal{G}$) of size
477 [11]. Evidently, at $q^* = 103$ the MCN has one distinctly large component containing 477
miRNAs, whereas the moderate-size modules appear with competitive sizes (101 and 98). The largest
component must have been broken into these smaller modules. Thus, as far as 'identifying a large
set of relevant miRNAs' (one like $\mathcal{G}$) is concerned, one can reliably consider the
component $\mathcal{G}$ as the optimal set of miRNAs which co-regulate gene expression. Further,
to understand how miRNAs are organized within $\mathcal{G}$, we calculate its modules by taking
$q = 160$, which is the same value of $q$ used in [11] to obtain the clusters (70 in total). All
the components of $\mathcal{G}$ having two or more miRNAs (referred to as clusters) are shown in
Fig. 3 in decreasing order of their sizes. The first five, named (a) to (e), have 47, 31, 16, 14
and 9 miRNAs respectively.

It would be interesting to look at the community structure of $\mathcal{G}$ at this value,
$q = 160$. Using the modularization algorithm [18], we find that $\mathcal{G}$ contains 72 modules
(330 miRNAs in total) and 147 single miRNAs. In terms of components, $\mathcal{G}$ had 70 clusters
(330 miRNAs in total) and 147 independent miRNAs [11]. A detailed study of the modules reveals
that only two of the 70 clusters are broken into smaller modules: cluster (a) in Fig. 3, with 47
miRNAs, has two modules $a_1$ and $a_2$ of size 34 and 13 respectively, and cluster (b), with 31
miRNAs, breaks into two modules $b_1$ (22 miRNAs) and $b_2$ (9 miRNAs). Such modular structures
of (a) and (b) were not apparent in Fig. 3; we therefore redraw these graphs, keeping all miRNAs
of the same module close to each other. The



Figure 4. The clusters (a) and (b) of Fig. 3 are redrawn to visualize the community
structure obtained through modularization. Clearly (a), which contains 47 miRNAs, has two
modules $a_1$ (34 miRNAs) and $a_2$ (13 miRNAs), and (b) has two modules $b_1$ (22 miRNAs) and
$b_2$ (9 miRNAs).

resulting graphs (Fig. 4 (a) and (b)) clearly show the existence of modular structures.

In summary, the community structure of these networks is very similar to the components
(or clusters) obtained earlier in [11]. Only a few large components show further small
substructures, indicating that the existing components of the MCN are already optimally
modularized. Implications of these results will be discussed in section 4.

A few comments are in order. It is quite evident that cluster (c) in Fig. 3, containing 16
miRNAs, might have substructures of size 11 and 5, which could not be obtained when the
modularization algorithm was applied to the unweighted graph $\mathcal{G}$ containing 477 miRNAs.
In that analysis the actual weights were ignored, i.e., all links having weight more than 160 were
considered identical irrespective of their actual weight. When we keep these weights and use the
modified version of the algorithm [19] that works for weighted networks, cluster (c) shows
the predicted substructures. In addition, some other modules, such as the 34-miRNA module of
cluster (a), also show further substructures, of size 30 and 4. These four nodes turn out to be
those shown on the left side of that module in Fig. 4.
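
To illustrate why keeping the weights can reveal substructure that the unweighted analysis misses, the sketch below uses the standard weighted form of modularity, in which the adjacency entries are replaced by link weights, degrees by node strengths, and $m$ by the total weight. It is a toy illustration under these assumptions, not the computation of Ref. [19] on the real network; the function name and the four-node example are hypothetical.

```python
def weighted_modularity(weights, labels):
    """Q = sum_c [W_c / W - (S_c / 2W)^2], using link weights instead of 0/1 entries.

    `weights` maps node pairs (a, b) to link weights; `labels` maps each node
    to its community. Setting every weight to 1 gives the unweighted modularity.
    """
    total = sum(weights.values())    # W: total link weight (plays the role of m)
    inside = {}                      # W_c: weight of links inside community c
    strength = {}                    # S_c: summed node strength of community c
    for (a, b), w in weights.items():
        strength[labels[a]] = strength.get(labels[a], 0) + w
        strength[labels[b]] = strength.get(labels[b], 0) + w
        if labels[a] == labels[b]:
            inside[labels[a]] = inside.get(labels[a], 0) + w
    return sum(inside.get(c, 0) / total - (s / (2 * total)) ** 2
               for c, s in strength.items())

# Hypothetical toy network (assumption): a 4-clique with two heavy links.
toy_weights = {(0, 1): 10, (2, 3): 10, (0, 2): 1, (0, 3): 1, (1, 2): 1, (1, 3): 1}
split = {0: "x", 1: "x", 2: "y", 3: "y"}

q_weighted = weighted_modularity(toy_weights, split)
q_unweighted = weighted_modularity({pair: 1 for pair in toy_weights}, split)
```

With all links treated as identical, splitting the clique gives $Q = -1/6$, worse than leaving it whole ($Q = 0$); with the weights kept, the same split gives $Q = 1/3$, so the two heavily linked pairs emerge as separate modules.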

It appears that Newman's algorithm, for both unweighted and weighted networks, provides
only the substructures of large components. This is because the modularity of the network is
inversely proportional to the total number of links (see Eq. (1)). Thus, the total modularity
of a network with many components is not substantially altered by restructuring the small
components into smaller substructures. It is only the restructuring of larger components
which can change the modularity appreciably. To overcome this difficulty, one must find the
modular structure of individual components, instead of looking at the community structure of the
whole network.
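
The argument above can be made concrete with a small numerical sketch. It uses the multi-community form of modularity, $Q = \sum_c [L_c/m - (d_c/2m)^2]$, where $L_c$ is the number of links inside community $c$ and $d_c$ its summed degree; this is equivalent to summing Eq. (1)-type terms over same-community pairs. The toy network (one large and one small component, each made of two cliques joined by a single link) and all helper names are assumptions chosen for illustration.

```python
from itertools import combinations

def modularity(edges, labels):
    """Q = sum over communities of [L_c / m - (d_c / 2m)^2] for an unweighted graph."""
    m = len(edges)
    inside = {}   # L_c: number of edges inside each community
    degree = {}   # d_c: summed degree of each community
    for u, v in edges:
        degree[labels[u]] = degree.get(labels[u], 0) + 1
        degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

def bridged_cliques(size, offset):
    """Two cliques of `size` nodes each, joined by one bridge edge."""
    left = list(range(offset, offset + size))
    right = list(range(offset + size, offset + 2 * size))
    edges = list(combinations(left, 2)) + list(combinations(right, 2))
    edges.append((left[-1], right[0]))
    return left, right, edges

def gain_from_split(edges, merged, part):
    """Change in Q when the nodes in `part` are moved into a community of their own."""
    split = dict(merged)
    for node in part:
        split[node] = ("split", merged[node])
    return modularity(edges, split) - modularity(edges, merged)

big_a, big_b, big_edges = bridged_cliques(6, 0)         # large component, 31 edges
small_a, small_b, small_edges = bridged_cliques(3, 12)  # small component, 7 edges
all_edges = big_edges + small_edges

merged = {n: "big" for n in big_a + big_b}
merged.update({n: "small" for n in small_a + small_b})

gain_big = gain_from_split(all_edges, merged, big_b)        # split the large component
gain_small = gain_from_split(all_edges, merged, small_b)    # split the small component
small_only = {n: "small" for n in small_a + small_b}
gain_small_alone = gain_from_split(small_edges, small_only, small_b)
```

In this toy case, splitting the large component raises $Q$ by about 0.31, whereas splitting the small one changes $Q$ by less than 0.01 (and here slightly lowers it), so a whole-network maximization leaves the small component intact. Treated in isolation, the same small component gains about 0.36 from the split, which is the motivation for analyzing components individually.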

4. Conclusion 

To our surprise, the community structure of the human miRNA co-target network is very similar
to the existing components or clusters. Only a few large components show smaller substructures.
Most of the components do not show any further substructure, indicating that the miRNA
co-target network inherently consists of optimally modularized structures. It is quite possible
that, during the evolution of miRNAs, the modular structures are first formed and optimized,
and then join with other modules to provide essential regulation for complex life structures.
Further study in these directions is required to verify this hypothesis.